Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

Authors

Seyed Sajad Mousavi

Michael Schukat

Peter Corcoran

Enda Howley

Abstract

Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this paper we build two kinds of reinforcement learning agents, one deep policy-gradient based and one value-function based, which can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The policy-gradient based agent maps its observation directly to the control signal, whereas the value-function based agent first estimates values for all legal control signals and then selects the control action with the highest value. Our methods show promising results in a traffic network simulated in the SUMO traffic simulator, without suffering from instability issues during the training process.
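
The difference between the two action-selection schemes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, plain Python lists stand in for the outputs of the deep networks, and sampling from a softmax over the policy outputs is an assumption (the abstract only says the observation is mapped directly to a control signal).

```python
import math
import random


def select_value_based(q_values, legal_actions):
    """Value-function agent: pick the legal action with the highest estimated value.

    q_values is a hypothetical list of per-action value estimates.
    """
    return max(legal_actions, key=lambda a: q_values[a])


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_policy_based(logits, rng):
    """Policy-gradient agent: sample an action from the policy's distribution.

    Assumption: the policy output is a vector of logits, one per control signal.
    """
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical traffic-light phases, indexed 0 and 1.
    print(select_value_based([0.2, 0.7], legal_actions=[0, 1]))
    print(select_policy_based([0.1, 1.5], rng))
```

The value-based path needs an estimate for every legal signal before it can choose, while the policy-based path produces a distribution over signals in one step and never compares values explicitly.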

Related resources

Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning

Off-policy model-free deep reinforcement learning methods using previously collected data can improve sample efficiency over on-policy policy gradient techniques. On the other hand, on-policy algorithms are often more stable and easier to use. This paper examines, both theoretically and empirically, approaches to merging on- and off-policy updates for deep reinforcement learning. Theoretical resu...

Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic characteristic of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...

Deep Primal-Dual Reinforcement Learning: Accelerating Actor-Critic using Bellman Duality

We develop a parameterized Primal-Dual π Learning method based on deep neural networks for Markov decision processes with large state spaces and off-policy reinforcement learning. In contrast to the popular Q-learning and actor-critic methods that are based on successive approximations to the nonlinear Bellman equation, our method makes primal-dual updates to the policy and value functions utilizi...

Learning Road Traffic Control: Towards Practical Traffic Control Using Policy Gradients Diplomarbeit

The optimal control of traffic lights in urban road networks is a highly complex problem. Many factors influence the flow of traffic, and hence the performance of a traffic network, of which few can readily be measured. Currently used control systems are often relatively simple and date back several decades, while more sophisticated optimisation methods fail for large networks. Reinforcement le...

Using Deep Q-Learning to Control Optimization Hyperparameters

We present a novel definition of the reinforcement learning state, actions and reward function that allows a deep Q-network (DQN) to learn to control an optimization hyperparameter. Using Q-learning with experience replay, we train two DQNs to accept a state representation of an objective function as input and output the expected discounted return of rewards, or q-values, connected to the actio...

Journal:

CoRR

Volume: abs/1704.08883

Publication date: 2017
